Release Date: 06/01/2020 Due Date: 06/10/2020 – 6:30pm

CSE 141 ­– Homework 5 (15 pts): Hazards and Caches

# **(1 pt)** In the following program, identify all types of data dependency (i.e. RAW, WAR, WAW). **1** add r1 r2 r3 **2** sub r7 r1 r8 **3** mul r1 r5 r6 Your answer format should be: “<dependency type> dependency from line <line no.> to line <line no.>”

# **RAW dependency from line 1 to line 2**

# **WAR dependency from line 2 to line 3**

# **WAW dependency from line 1 to line 3**

# **(3 pts)** Suppose we have a 5-stage pipeline processor: Fetch 🡪 Decode 🡪 Execute 🡪 Memory 🡪 Write Back. Also suppose we have the following program: **1** bne r1 r0 label **2** add r4 r5 r6 **3** sub r5 r4 r3 **4** label: mul r1 r2 r3 **5** ld r1 0(r1) **6** add r1 r1 r1 a) **(1 pt)** Suppose the branch in line 1 is supposed to be taken (i.e. the next instruction to fetch is the one in line 4, not the one in line 2). Also suppose there is no branch predictor. Then, we notice that there is a control hazard, where the processor fetches the two instructions in lines 2 and 3. In this case, choose the appropriate action that the processor should take:

# **Flush** b) **(1 pt)** Now consider lines 4 and 5. Note that there is a data dependence (RAW) between the two instructions. In this case, choose the appropriate action that the processor should take:

# **Forward** c) **(1 pt)** Now consider lines 5 and 6. Note that there is a data dependence (RAW) between the two instructions. In this case, choose the appropriate action that the processor should take. Note that more than one choice could be valid. **Forward,** **Stall**

# **(2 pts)** Compute the branch prediction accuracy for the following two branch histories for the 1-bit and 2-bit branch predictors. Further, the 1-bit predictor starts in the ’Not-Taken’ and the 2-bit predictor starts in the ’Weakly-Taken’ state at the beginning of each branch trace below. Given this information, what would the accuracy be for each predictor type on the following two branch traces. Provide your solution inline below, but show any additional work in the solution box. a) **(1 pt)** Trace 1: *T T N T T N T N T N T*

# **1-**bit 0 1 1 0 1 1 0 1 0 1 0 (shift 1 right)

# **X o x x o x x x x x x**

# **2-bit**

# **10 11 11 10 11 11 10 11 10 11 10**

# **O o x o o x o x o x o**

# 1-bit accuracy is: 2/11 2-bit accuracy is: 7/11 b) **(1 pt)** Trace 1: *N N T N N T N N T N T T T* 1-bit accuracy is: \_\_\_\_\_\_\_ % 2-bit accuracy is: \_\_\_\_\_\_\_ %

# **(4 Points)** Cache Layout: A processor has a separate D-cache and an I-cache.

# D-cache: 64KB, 4-way set associative, block size of 1 word, write-back policy

# I-cache: 32KB, direct mapped cache, block size of 1 word

# The processor uses the LRU algorithm for its replacement policy. Answer the following questions.

# Make sure that you account for all the book-keeping bits.

# Calculate the number of tag, index and offset bits for the D-cache.

# **18 tag bits** 32-12-2

# **12 Index bits** log2(64\*1024/(4\*4))

# **2 offset bits** log24

# Calculate the number of tag, index and offset bits for the I-cache.

# How many bits are needed to implement the D-cache?

# **54 bits per block**

# How many bits are needed to implement the I-cache?

# **(5 Points)** Performance: In this question we will compare the performance of two processors using a certain benchmark.

# 

# We are also given that accesses to the main memory takes 90ns. Additionally Processor 2 has to execute 15% more dynamic instructions than Processor 1. It is also seen that only 20% of the instructions access memory (i.e, lw and sw instructions). Profiling the code also showed us that 15% of the instructions were branch instructions. Both processors support branch prediction with 90% of the branches being predicted accurately and mis-predicts with P1 having a 1 cycle penalty and P2 having a 2 cycle penalty. Assume that we have no data hazards.

# If L1 Hit time determines the cycle time then what is the clock rate of P1 and P2 ?

# **1/.5 ns P1**

# **2 ghz P1, 1.11ghz P2**

# Calculate how many cycles it takes to access data memory for both the processors.

# **P1: 90ns/0.5ns = 180 cycles**

# **P2: 90ns/0.9ns = 100 cycles**

# Estimate the MCPI of the benchmark for both the processors.

# Hint : You need to consider both instruction and data memory

# **Accesses/inst \* miss rate \* miss penalty**

# **P1: 1\*0.02\*180 + 0.2\*0.08\*180**

# **P2:**

# Estimate the BCPI (Contribution due to branches) for both the processors

# **P1: %branch instructions \* mispredictions \* cycles**

# **0.15 \* 0.10 \* 1 = 0.015**

# **P2:**

# Considering just this benchmark, which of these processors is faster? (Assume Base CPI = 1.0 for both) Hint : Use execution time

# TCPI = BCPI + MCPI

# P1 execution time = IC \* TCPI \* CT

**Submission Instructions (Read Carefully):**

1. Please type your responses in a separate document and submit the PDF file to gradescope. ONLY typed PDF documents will be accepted as a valid submission. However, if there are any questions that ask you to draw a diagram, those can be hand-drawn, but must be attached to the final PDF file. Please do not submit a separate file for hand-drawn diagrams. Only one, final PDF file will be accepted.

2. If you have any questions about the homework problems, please post your question on piazza.

3. No late submissions allowed!